Introduction

This project looks at the height, weight, nationality, and position of NHL players on 2022-23 rosters. The data came from Kaggle. The goal of the analysis is to better understand how NHL teams are structured and which physical characteristics matter for different positions.

Preprocessing

The NHL player dataset, downloaded from Kaggle as a CSV file, was cleaned to make the analysis clearer and easier to run. The position variable was expanded from single-letter codes to full position names. A region variable was created to compare how strongly each area is represented in the player population. The height variable was originally stored as a character type, so it was converted to a numeric type that records height in inches, which made data wrangling and visualization easier. Ninety-three entries were missing and were recorded as NA values; they represent only about 3% of the data.
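The cleaning steps described above can be sketched in Python. This is a minimal illustration, not the code used for this report: the single-letter position codes, the region groupings, the column names, and the feet-and-inches text format are all assumptions, since the raw Kaggle file is not shown here.

```python
import re

# Hypothetical lookup tables: the letter codes and region groupings below are
# assumptions for illustration, not values confirmed from the Kaggle file.
POSITION_NAMES = {"C": "Center", "L": "Left Wing", "R": "Right Wing",
                  "D": "Defenseman", "G": "Goalie"}
REGIONS = {"CAN": "North America", "USA": "North America",
           "SWE": "Europe", "FIN": "Europe", "CZE": "Europe",
           "RUS": "Russia"}

# Assumed raw format: feet, an apostrophe, inches, and an optional double quote.
HEIGHT_PATTERN = re.compile(r"""\s*(\d+)\s*'\s*(\d+)\s*"?\s*""")

def height_to_inches(text):
    """Convert a string such as 6' 1" to inches; return None if it cannot be parsed."""
    if not text:
        return None
    match = HEIGHT_PATTERN.fullmatch(text)
    if match is None:
        return None  # treated as missing, like an NA value
    feet, inches = int(match.group(1)), int(match.group(2))
    return feet * 12 + inches

def clean_player(row):
    """Return a cleaned copy of one player record (a dict with assumed keys)."""
    return {
        "position": POSITION_NAMES.get(row["position"], row["position"]),
        "region": REGIONS.get(row["nationality"], "Other"),
        "height_in": height_to_inches(row.get("height")),
    }

print(clean_player({"position": "D", "nationality": "SWE", "height": "6' 3\""}))
```

Returning None for unparseable heights keeps missing entries visible instead of silently dropping them, so they can be counted later.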

General Analysis of NHL Player Data

Nationality

##    nationality    n percentage
## 1          CAN 1346 45.7667460
## 2          USA  757 25.7395444
## 3          SWE  249  8.4665080
## 4          FIN  148  5.0323019
## 5          RUS  147  4.9982999
## 6          CZE  123  4.1822509
## 7          SVK   37  1.2580755
## 8          CHE   33  1.1220673
## 9          DEU   27  0.9180551
## 10         DNK   17  0.5780347

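The nationality of current NHL players is dominated by North American and European players. Canada (about 46%) and the United States (about 26%) are the top two and together account for roughly 72% of players. Sweden is a distant third at about 8.5%.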
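A count-and-percentage table like the one above can be built with a short helper. The sketch below uses a toy list of country codes rather than the real roster, and the function name is hypothetical. The final lines add up the ten percentages printed in the table to get the combined share of the top ten countries.

```python
from collections import Counter

def share_table(values):
    """Return (value, count, percentage) rows, most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(name, n, 100 * n / total) for name, n in counts.most_common()]

# Toy input for illustration only, not the real roster.
toy_nationalities = ["CAN", "CAN", "USA", "SWE"]
for name, n, pct in share_table(toy_nationalities):
    print(f"{name:>4} {n:>3} {pct:6.2f}")

# Percentages copied from the table above; their sum is the top-ten share.
top_ten_pct = [45.7667460, 25.7395444, 8.4665080, 5.0323019, 4.9982999,
               4.1822509, 1.2580755, 1.1220673, 0.9180551, 0.5780347]
top_ten_share = sum(top_ten_pct)
print(f"Top ten countries combined: {top_ten_share:.2f}%")
```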

Height of Players

##   mean_height median_height       SD min max Q1 Q3
## 1    73.15573            73 2.167217  64  81 72 75

The average height for professional hockey players is around 73 inches (6’1”) with a standard deviation of around 2.2 inches. The middle half of players fall between 72 and 75 inches. It is not surprising that most players are clustered around the mean height, given the physicality of the game.
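The summary statistics above (mean, median, SD, minimum, maximum, and quartiles) can be reproduced with Python's standard library. This is a sketch on toy values, not the report's own code. The `inclusive` quantile method is chosen on the assumption that the original quartiles came from R's default quantile rule, which it matches.

```python
import statistics

def summarize(values):
    """Summary statistics for a numeric column, skipping missing (None) entries."""
    data = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "sd": statistics.stdev(data),  # sample SD (n - 1 denominator)
        "min": min(data),
        "max": max(data),
        "q1": q1,
        "q3": q3,
    }

# Toy heights in inches, with one missing value; not the real data.
toy_heights = [70, 72, 73, 74, 76, None]
summary = summarize(toy_heights)
print(summary)
```

Note that the SD is in inches, the same unit as the data, while the variance would be in squared inches.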

The histogram provides a clear picture of how symmetrical the distribution of height is among professional hockey players.

Weight of Players

##   mean_weight median_weight       SD min max  Q1  Q3
## 1    199.1054           198 16.10513 140 265 188 210

NHL players have a mean weight of around 200 pounds, with a standard deviation of about 16 pounds, as demonstrated by the weight distribution. This weight reflects the physical strength required of players in the National Hockey League. Relative to the mean, the spread is wider than for height (about 8% versus 3%), likely because weight can vary significantly depending on a player’s position and playing style.

Player Positions

Height by Position

Weight by Position

Defensemen and goalies have the tallest average heights, which suggests that reach and physical size are an important part of their roles on the roster. Defensemen are also heavier than the average forward. Goalies tend to be lean, which suggests that reach and agility are their main physical requirements. Forwards are typically the players who contribute points; a smaller and lighter frame likely makes it easier for them to maintain higher speeds, accelerate faster, and be more agile than the average defenseman.
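Position comparisons like these rest on group averages. The sketch below shows one way to compute a mean per position while skipping missing values. The rows are toy data and the field names are assumptions.

```python
from collections import defaultdict
from statistics import fmean

def mean_by_group(rows, group_key, value_key):
    """Average value_key within each group_key, ignoring missing (None) values."""
    groups = defaultdict(list)
    for row in rows:
        if row[value_key] is not None:
            groups[row[group_key]].append(row[value_key])
    return {group: fmean(values) for group, values in groups.items()}

# Toy rows for illustration; not taken from the dataset.
toy_rows = [
    {"position": "Defenseman", "height_in": 73},
    {"position": "Defenseman", "height_in": 75},
    {"position": "Forward", "height_in": 71},
    {"position": "Forward", "height_in": 73},
    {"position": "Goalie", "height_in": 75},
    {"position": "Goalie", "height_in": None},
]
position_means = mean_by_group(toy_rows, "position", "height_in")
print(position_means)
```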

Differences by Region

Height by Position and Region

When examining height distributions across different positions and regions, a few patterns appear. North American players have consistent height distributions in all positions, which may reflect the well-established player development in Canada and the United States’ youth programs. European players show similar height tendencies, particularly in forward positions, although some regional deviations exist among defensemen. Russian players tend to cluster within similar height ranges across positions.

Central Limit Theorem Analysis

The sampling distribution of mean heights is approximately normal and is centered on the population mean of approximately 73 inches.

Sampling Distribution of Weight

The sampling distribution of mean weight is also approximately normal, which is consistent with the Central Limit Theorem. Its narrow, concentrated shape illustrates how averaging over a sample reduces variability: sample means vary much less than individual player weights.

This visualization illustrates a crucial aspect of the Central Limit Theorem: as the sample size increases, the sampling distribution becomes narrower and more normal. For instance, when n=5, the distribution is wider and is not as smooth. However, as n increases to 15, 30, and 60, the distributions progressively narrow and move closer to a normal shape. The spread of the sample means (the standard error) shrinks in proportion to 1/√n, so larger samples yield more precise estimates of the population mean with reduced sampling variability.
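The narrowing with sample size can be checked with a small simulation. The sketch below does not use the Kaggle data: it builds a synthetic population drawn to roughly match the reported mean (73.16 inches) and SD (2.17 inches). It then compares the observed spread of sample means with the theoretical standard error, the population SD divided by the square root of n. The seed, population size, and number of repetitions are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic stand-in for the height column (NOT the real data): values are
# drawn to roughly match the reported mean and SD.
population = [rng.gauss(73.16, 2.17) for _ in range(3000)]
sigma = statistics.pstdev(population)

def sd_of_sample_means(n, reps=2000):
    """Draw reps samples of size n (with replacement); return the SD of their means."""
    means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]
    return statistics.stdev(means)

results = {}
for n in (5, 15, 30, 60):
    observed = sd_of_sample_means(n)
    theoretical = sigma / n ** 0.5  # standard error predicted by theory
    results[n] = (observed, theoretical)
    print(f"n={n:>2}  observed SD of means={observed:.3f}  sigma/sqrt(n)={theoretical:.3f}")
```

In runs of this kind the observed and theoretical columns should track each other closely, and both should fall as n grows.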

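The comparison of the theoretical normal distribution with the actual distribution of height in the NHL shows that the raw heights are themselves close to normal. The Central Limit Theorem concerns sample means rather than individual values, but this near-normality helps explain why the sampling distributions of the mean look normal even at small sample sizes.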

Sampling Methods

The grouped bar chart highlights how well stratified sampling works for this data, given the different population sizes of each position. Stratified sampling suits categorical data when the sample must preserve each subgroup’s proportion in the population. Although the population size for some positions was limited, simple random sampling without replacement and systematic sampling still revealed a clear distribution pattern among defensemen, goalies, and forwards.
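The three sampling methods compared above can be sketched as follows. This is an illustration on a toy roster, not the code behind the chart: the function names, position mix, and sample size are assumptions. The stratified version uses proportional allocation, so rounding can make its total differ slightly from the requested size in general.

```python
import random
from collections import Counter, defaultdict

def simple_random_sample(items, n, rng):
    """Simple random sampling without replacement."""
    return rng.sample(items, n)

def systematic_sample(items, n, rng):
    """Take every k-th item after a random start, where k = len(items) // n."""
    step = len(items) // n
    start = rng.randrange(step)
    return items[start::step][:n]

def stratified_sample(items, n, key, rng):
    """Sample each stratum in proportion to its share of the population."""
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)
    sample = []
    for members in strata.values():
        k = max(1, round(n * len(members) / len(items)))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample

rng = random.Random(1)
# Toy roster of (player_id, position) pairs; the 60/30/10 mix is made up.
positions = ["Forward"] * 60 + ["Defenseman"] * 30 + ["Goalie"] * 10
toy_roster = list(enumerate(positions))

srs = simple_random_sample(toy_roster, 20, rng)
systematic = systematic_sample(toy_roster, 20, rng)
stratified = stratified_sample(toy_roster, 20, key=lambda p: p[1], rng=rng)
stratified_counts = Counter(pos for _, pos in stratified)
print(stratified_counts)
```

Only the stratified sample is guaranteed to reproduce the position proportions. The other two methods match them only on average, which matters most for small groups such as goalies.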

Conclusion

The missing 3% of data, in my opinion, did not affect the quality of the data in a dramatic way. The limited position pools did, however, have an impact on the sampling results, and completeness of data should be prioritized in future analysis. Mean height is about 6’1” and mean weight is roughly 200 pounds; both distributions are mostly normal, with weight varying more with the physical demands of each position. Defensemen and goalies appear to rely on size and reach, while centers and wingers appear to rely on agility and speed. Even though the NHL has a global audience, North America dominates the player pool: Canada and the United States supply about 72% of players, and the top 10 countries account for about 98%. Although my data did not include any scoring statistics, trends in professional hockey appear to favor smaller players with high agility and dynamic scoring ability. High speed and high scoring!